Nested Chinese Restaurant Franchise Process: Applications to User Tracking and Document Modeling

Authors

  • Amr Ahmed
  • Alexander J. Smola
Abstract

Much natural data is hierarchical, and this hierarchy is often shared between different instances. We introduce the nested Chinese Restaurant Franchise Process to obtain hierarchical tree-structured representations for objects, akin to (but more general than) the nested Chinese Restaurant Process, while sharing their structure in the manner of the Hierarchical Dirichlet Process. Moreover, by decoupling the structure-generating part of the process from the components responsible for the observations, we are able to apply the same statistical approach to a variety of user-generated data. In particular, we model the joint distribution of microblogs and locations for Twitter users. This leads to a 40% reduction in location uncertainty relative to the best previously published results. We also model documents from the NIPS papers dataset, obtaining excellent perplexity relative to (hierarchical) Pachinko allocation and LDA.
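
For readers unfamiliar with the nested Chinese Restaurant Process that the abstract uses as its point of comparison, the following is a minimal sketch of how that standard prior draws a root-to-leaf path for each item in a tree of fixed depth. It is not the authors' nested Chinese Restaurant Franchise Process, which additionally shares the tree across instances; the function names, the fixed `depth`, and the single concentration parameter `alpha` are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def crp_choice(counts, alpha, rng):
    """Return an existing index with probability proportional to its count,
    or len(counts) (meaning "open a new one") with probability proportional to alpha."""
    r = rng.random() * (sum(counts) + alpha)
    for k, c in enumerate(counts):
        if r < c:
            return k
        r -= c
    return len(counts)


def sample_ncrp_paths(num_items, depth, alpha, seed=0):
    """Draw one root-to-leaf path per item (illustrative nCRP sketch).

    A node is identified by the tuple of child indices leading to it;
    the root is the empty tuple.
    """
    rng = random.Random(seed)
    # child_counts[node][k] = number of items that went from node to its k-th child
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_items):
        node = ()
        for _ in range(depth):
            counts = child_counts[node]
            k = crp_choice(counts, alpha, rng)
            if k == len(counts):
                counts.append(0)  # a previously unseen child is created
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_items=10, depth=3, alpha=1.0):
        print(path)
```

Popular branches attract more items, so the sampled paths tend to share prefixes; a larger `alpha` makes new branches more likely.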

Similar resources

Nested Hierarchical Dirichlet Process for Nonparametric Entity-Topic Analysis

The Hierarchical Dirichlet Process (HDP) is a Bayesian nonparametric prior for grouped data, such as collections of documents, where each group is a mixture of a set of shared mixture densities, or topics, where the number of topics is not fixed, but grows with data size. The Nested Dirichlet Process (NDP) builds on the HDP to cluster the documents, but allowing them to choose only from a set o...

Nested Hierarchical Dirichlet Processes for Multi-Level Non-Parametric Admixture Modeling

The Dirichlet Process (DP) is a Bayesian non-parametric prior for infinite mixture modeling, where the number of mixture components grows with the number of data items. The Hierarchical Dirichlet Process (HDP), often used for non-parametric topic modeling, is an extension of the DP for grouped data, where each group is a mixture over shared mixture densities. The Nested Dirichlet Process (nDP), on the o...

Nonparametric Topic Modeling Using Chinese Restaurant Franchise with Buddy Customers

Many popular latent topic models for text documents make two assumptions. The first is a finite-dimensional parameter space. The second is the bag-of-words assumption, which prevents such models from capturing the interdependence between words. While existing nonparametric admixture models relax the first assumption, they still impose the second assumption me...

The Nested Chinese Restaurant Process and Hierarchical Topic Models

We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a nonparametric Bayesian model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as pa...

Publication date: 2013